{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 模型参数的访问、初始化和共享\n",
    "\n",
    "在[“线性回归的简洁实现”](../chapter_deep-learning-basics/linear-regression-gluon.ipynb)一节中，我们通过`init`模块来初始化模型的全部参数。我们也介绍了访问模型参数的简单方法。本节将深入讲解如何访问和初始化模型参数，以及如何在多个层之间共享同一份模型参数。\n",
    "\n",
    "我们先定义一个与上一节中相同的含单隐藏层的多层感知机。我们依然使用默认方式初始化它的参数，并做一次前向计算。与之前不同的是，在这里我们从MXNet中导入了`init`模块，它包含了多种模型初始化方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "1"
    }
   },
   "outputs": [],
   "source": [
    "from mxnet import init, nd\n",
    "from mxnet.gluon import nn\n",
    "\n",
    "net = nn.Sequential()\n",
    "net.add(nn.Dense(256, activation='relu'))\n",
    "net.add(nn.Dense(10))\n",
    "net.initialize()  # 使用默认初始化方式\n",
    "\n",
    "X = nd.random.uniform(shape=(2, 20))\n",
    "Y = net(X)  # 前向计算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 访问模型参数\n",
    "\n",
    "对于使用`Sequential`类构造的神经网络，我们可以通过方括号`[]`来访问网络的任一层。回忆一下上一节中提到的`Sequential`类与`Block`类的继承关系。对于`Sequential`实例中含模型参数的层，我们可以通过`Block`类的`params`属性来访问该层包含的所有参数。下面，访问多层感知机`net`中隐藏层的所有参数。索引0表示隐藏层为`Sequential`实例最先添加的层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "2"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(dense0_ (\n",
       "   Parameter dense0_weight (shape=(256, 20), dtype=float32)\n",
       "   Parameter dense0_bias (shape=(256,), dtype=float32)\n",
       " ), mxnet.gluon.parameter.ParameterDict)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].params, type(net[0].params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，我们得到了一个由参数名称映射到参数实例的字典（类型为`ParameterDict`类）。其中权重参数的名称为`dense0_weight`，它由`net[0]`的名称（`dense0_`）和自己的变量名（`weight`）组成。而且可以看到，该参数的形状为(256, 20)，且数据类型为32位浮点数（`float32`）。为了访问特定参数，我们既可以通过名字来访问字典里的元素，也可以直接使用它的变量名。下面两种方法是等价的，但通常后者的代码可读性更好。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Parameter dense0_weight (shape=(256, 20), dtype=float32),\n",
       " Parameter dense0_weight (shape=(256, 20), dtype=float32))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].params['dense0_weight'], net[0].weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gluon里参数类型为`Parameter`类，它包含参数和梯度的数值，可以分别通过`data`函数和`grad`函数来访问。因为我们随机初始化了权重，所以权重参数是一个由随机数组成的形状为(256, 20)的`NDArray`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[[ 0.06700657 -0.00369488  0.0418822  ... -0.05517294 -0.01194733\n",
       "  -0.00369594]\n",
       " [-0.03296221 -0.04391347  0.03839272 ...  0.05636378  0.02545484\n",
       "  -0.007007  ]\n",
       " [-0.0196689   0.01582889 -0.00881553 ...  0.01509629 -0.01908049\n",
       "  -0.02449339]\n",
       " ...\n",
       " [ 0.00010955  0.0439323  -0.04911506 ...  0.06975312  0.0449558\n",
       "  -0.03283203]\n",
       " [ 0.04106557  0.05671307 -0.00066976 ...  0.06387014 -0.01292654\n",
       "   0.00974177]\n",
       " [ 0.00297424 -0.0281784  -0.06881659 ... -0.04047417  0.00457048\n",
       "   0.05696651]]\n",
       "<NDArray 256x20 @cpu(0)>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "权重梯度的形状和权重的形状一样。因为我们还没有进行反向传播计算，所以梯度的值全为0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "5"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[[0. 0. 0. ... 0. 0. 0.]\n",
       " [0. 0. 0. ... 0. 0. 0.]\n",
       " [0. 0. 0. ... 0. 0. 0.]\n",
       " ...\n",
       " [0. 0. 0. ... 0. 0. 0.]\n",
       " [0. 0. 0. ... 0. 0. 0.]\n",
       " [0. 0. 0. ... 0. 0. 0.]]\n",
       "<NDArray 256x20 @cpu(0)>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.grad()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "类似地，我们可以访问其他层的参数，如输出层的偏差值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "6"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n",
       "<NDArray 10 @cpu(0)>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[1].bias.data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，我们可以使用`collect_params`函数来获取`net`变量所有嵌套（例如通过`add`函数嵌套）的层所包含的所有参数。它返回的同样是一个由参数名称到参数实例的字典。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sequential0_ (\n",
       "  Parameter dense0_weight (shape=(256, 20), dtype=float32)\n",
       "  Parameter dense0_bias (shape=(256,), dtype=float32)\n",
       "  Parameter dense1_weight (shape=(10, 256), dtype=float32)\n",
       "  Parameter dense1_bias (shape=(10,), dtype=float32)\n",
       ")"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.collect_params()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这个函数可以通过正则表达式来匹配参数名，从而筛选需要的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "8"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sequential0_ (\n",
       "  Parameter dense0_weight (shape=(256, 20), dtype=float32)\n",
       "  Parameter dense1_weight (shape=(10, 256), dtype=float32)\n",
       ")"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.collect_params('.*weight')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 初始化模型参数\n",
    "\n",
    "我们在[“数值稳定性和模型初始化”](../chapter_deep-learning-basics/numerical-stability-and-init.ipynb)一节中描述了模型的默认初始化方法：权重参数元素为[-0.07, 0.07]之间均匀分布的随机数，偏差参数则全为0。但我们经常需要使用其他方法来初始化权重。MXNet的`init`模块里提供了多种预设的初始化方法。在下面的例子中，我们将权重参数初始化成均值为0、标准差为0.01的正态分布随机数，并依然将偏差参数清零。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "9"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[ 0.01074176  0.00066428  0.00848699 -0.0080038  -0.00168822  0.00936328\n",
       "  0.00357444  0.00779328 -0.01010307 -0.00391573  0.01316619 -0.00432926\n",
       "  0.0071536   0.00925416 -0.00904951 -0.00074684  0.0082254  -0.01878511\n",
       "  0.00885884  0.01911872]\n",
       "<NDArray 20 @cpu(0)>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 非首次对模型初始化需要指定force_reinit为真\n",
    "net.initialize(init=init.Normal(sigma=0.01), force_reinit=True)\n",
    "net[0].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面使用常数来初始化权重参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "10"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
       "<NDArray 20 @cpu(0)>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.initialize(init=init.Constant(1), force_reinit=True)\n",
    "net[0].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果只想对某个特定参数进行初始化，我们可以调用`Parameter`类的`initialize`函数，它与`Block`类提供的`initialize`函数的使用方法一致。下例中我们对隐藏层的权重使用Xavier随机初始化方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "11"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[ 0.00512482 -0.06579044 -0.10849719 -0.09586414  0.06394844  0.06029618\n",
       " -0.03065033 -0.01086642  0.01929168  0.1003869  -0.09339568 -0.08703034\n",
       " -0.10472868 -0.09879824 -0.00352201 -0.11063069 -0.04257748  0.06548801\n",
       "  0.12987629 -0.13846186]\n",
       "<NDArray 20 @cpu(0)>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.initialize(init=init.Xavier(), force_reinit=True)\n",
    "net[0].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 自定义初始化方法\n",
    "\n",
    "有时候我们需要的初始化方法并没有在`init`模块中提供。这时，可以实现一个`Initializer`类的子类，从而能够像使用其他初始化方法那样使用它。通常，我们只需要实现`_init_weight`这个函数，并将其传入的`NDArray`修改成初始化的结果。在下面的例子里，我们令权重有一半概率初始化为0，有另一半概率初始化为$[-10,-5]$和$[5,10]$两个区间里均匀分布的随机数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "12"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Init dense0_weight (256, 20)\n",
      "Init dense1_weight (10, 256)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\n",
       "[-5.3659673  7.5773945  8.986376  -0.         8.827555   0.\n",
       "  5.9840508 -0.         0.         0.         7.4857597 -0.\n",
       " -0.         6.8910007  6.9788704 -6.1131554  0.         5.4665203\n",
       " -9.735263   9.485172 ]\n",
       "<NDArray 20 @cpu(0)>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class MyInit(init.Initializer):\n",
    "    def _init_weight(self, name, data):\n",
    "        print('Init', name, data.shape)\n",
    "        data[:] = nd.random.uniform(low=-10, high=10, shape=data.shape)\n",
    "        data *= data.abs() >= 5\n",
    "\n",
    "net.initialize(MyInit(), force_reinit=True)\n",
    "net[0].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此外，我们还可以通过`Parameter`类的`set_data`函数来直接改写模型参数。例如，在下例中我们将隐藏层参数在现有的基础上加1。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "13"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[-4.3659673  8.5773945  9.986376   1.         9.827555   1.\n",
       "  6.9840508  1.         1.         1.         8.48576    1.\n",
       "  1.         7.8910007  7.9788704 -5.1131554  1.         6.4665203\n",
       " -8.735263  10.485172 ]\n",
       "<NDArray 20 @cpu(0)>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.set_data(net[0].weight.data() + 1)\n",
    "net[0].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 共享模型参数\n",
    "\n",
    "在有些情况下，我们希望在多个层之间共享模型参数。[“模型构造”](model-construction.ipynb)一节介绍了如何在`Block`类的`forward`函数里多次调用同一个层来计算。这里再介绍另外一种方法，它在构造层的时候指定使用特定的参数。如果不同层使用同一份参数，那么它们在前向计算和反向传播时都会共享相同的参数。在下面的例子里，我们让模型的第二隐藏层（`shared`变量）和第三隐藏层共享模型参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "14"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "[1. 1. 1. 1. 1. 1. 1. 1.]\n",
       "<NDArray 8 @cpu(0)>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = nn.Sequential()\n",
    "shared = nn.Dense(8, activation='relu')\n",
    "net.add(nn.Dense(8, activation='relu'),\n",
    "        shared,\n",
    "        nn.Dense(8, activation='relu', params=shared.params),\n",
    "        nn.Dense(10))\n",
    "net.initialize()\n",
    "\n",
    "X = nd.random.uniform(shape=(2, 20))\n",
    "net(X)\n",
    "\n",
    "net[1].weight.data()[0] == net[2].weight.data()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们在构造第三隐藏层时通过`params`来指定它使用第二隐藏层的参数。因为模型参数里包含了梯度，所以在反向传播计算时，第二隐藏层和第三隐藏层的梯度都会被累加在`shared.params.grad()`里。\n",
    "\n",
    "\n",
    "## 小结\n",
    "\n",
    "* 有多种方法来访问、初始化和共享模型参数。\n",
    "* 可以自定义初始化方法。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 查阅有关`init`模块的MXNet文档，了解不同的参数初始化方法。\n",
    "* 尝试在`net.initialize()`后、`net(X)`前访问模型参数，观察模型参数的形状。\n",
    "* 构造一个含共享参数层的多层感知机并训练。在训练过程中，观察每一层的模型参数和梯度。\n",
    "\n",
    "\n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/987)\n",
    "\n",
    "![](../img/qr_parameters.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
